Key Points
- Genetic linkage analysis can be used as a tool for estimating the genetic distance between two loci.
- In family data, a small recombination fraction between a hypothesized disease locus and a genetic marker is evidence of a short distance between the two loci.
- Linkage analysis is contrasted with family-based association analysis, in which unaffected family members serve as control individuals.
- Single-nucleotide variants (SNVs) generated by whole-genome sequencing (WGS) can be used in linkage analysis.
- We describe various linkage algorithms and their properties, as well as their implementations.
- A detailed enumeration of the pertinent steps in linkage analysis provides a guideline for non-specialists on procedures and pitfalls.
Abstract
For many years, linkage analysis was the primary tool used for the genetic mapping of Mendelian and complex traits with familial aggregation. Linkage analysis was largely supplanted by the wide adoption of genome-wide association studies (GWASs). However, with the recent increased use of whole-genome sequencing (WGS), linkage analysis is again emerging as an important and powerful analysis method for the identification of genes involved in disease aetiology, often in conjunction with WGS filtering approaches. Here, we review the principles of linkage analysis and provide practical guidelines for carrying out linkage studies using WGS data.
Acknowledgements
This work was supported by the Natural Science Foundation of China grant 31470070 (to J.O.) and the US National Institutes of Health grants R01 DC003594, R01 DC011651 and U54 HG006493 (to S.M.L.).
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Glossary
- Genetic mapping: The ordering of loci on a chromosome and the determination of the distances between adjacent loci. For short distances, the recombination fraction can serve as a measure of genetic distance, with the unit of measurement being the centimorgan (cM); 1 cM = 1% recombination fraction.
- Genetic linkage: A phenomenon whereby two alleles, one each at two different loci, are transmitted together from parents to offspring more often than expected by chance. It leads to a recombination fraction smaller than 0.5.
- Phenocopies: Individuals that exhibit the phenotype of a Mendelian trait but that are not carriers of a susceptibility genotype. Phenocopies were originally thought to result from non-genetic factors, but genes at locations other than those under current consideration can also lead to (genetic) phenocopies.
- Penetrance: The conditional probability of being affected given one of the genotypes at the disease locus, '++', '+d' or 'dd', where 'd' is the disease allele and '+' the non-disease (wild-type) allele. More generally, penetrance is the conditional probability of a phenotype given a genotype.
- Recombination: The event in which two alleles, one from each of two loci, are inherited from one parent but originate from two different grandparents. If the two loci are on the same chromosome, a recombination is the result of an odd number of crossovers between them.
- Crossing over: A cytogenetic phenomenon that occurs during the formation of human gametes (egg or sperm cells). The salient feature of crossing over is that it occurs semi-randomly along chromosomes, with at least one crossover occurring on each chromosome in meiosis.
- Recombination fraction (θ): The expected number of recombinant children divided by the total number of recombinant and non-recombinant children. For two loci in close proximity to each other, θ is small owing to the randomness of crossing over, but it increases to 0.5 for loci that are far apart.
- LOD score: Z(x) = log₁₀[L(x)/L(∞)] is the base-10 logarithm of the likelihood ratio, with the numerator being calculated under the assumption of linkage (trait locus at map position x) and the denominator under no linkage (trait locus at infinite distance, that is, θ = 0.5). A LOD score of 3.3 or higher has been shown to correspond to a genome-wide significance level of 0.05.
- Mendelian inheritance model: The Mendelian laws of inheritance, when applied to variants, stipulate that an individual carries two copies (alleles) at a given nucleotide position and passes one of them at random to each of their offspring. Disease may be the result of a single copy of the disease allele (dominant inheritance) or of two copies (recessive inheritance) in an individual.
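The glossary definitions of the recombination fraction, the LOD score and the centimorgan can be made concrete with a small numerical sketch. It covers only the simplest textbook case, not a general pedigree likelihood: it assumes phase-known, fully informative meioses in which recombinants can be counted directly, so that the likelihood is binomial in θ. The LOD score is written as a function of θ rather than of a map position x, and the Haldane map function (which assumes no crossover interference) is used to convert θ to centimorgans. The function names `lod_score`, `theta_mle` and `haldane_cm` are illustrative and do not come from any linkage package.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score Z(theta) = log10[L(theta) / L(0.5)].

    Assumes phase-known, fully informative meioses, so that
    L(theta) = theta^k * (1 - theta)^(n - k) with k recombinants among n.
    """
    k, n = recombinants, meioses
    if not 0 <= k <= n:
        raise ValueError("need 0 <= recombinants <= meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")

    log_linked = 0.0
    if k > 0:
        if theta == 0.0:
            # Any observed recombinant rules out theta = 0.
            return -math.inf
        log_linked += k * math.log10(theta)
    if n - k > 0:
        log_linked += (n - k) * math.log10(1.0 - theta)

    # Under no linkage every meiosis is recombinant with probability 0.5.
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked


def theta_mle(recombinants: int, meioses: int) -> float:
    """Maximum-likelihood estimate of theta, truncated at 0.5."""
    if meioses <= 0:
        raise ValueError("need at least one meiosis")
    return min(recombinants / meioses, 0.5)


def haldane_cm(theta: float) -> float:
    """Genetic distance in cM under the Haldane map function (no interference)."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)


if __name__ == "__main__":
    k, n = 1, 20  # hypothetical counts: 1 recombinant in 20 informative meioses
    theta_hat = theta_mle(k, n)
    print("theta_hat:", theta_hat)
    print("max LOD:", round(lod_score(k, n, theta_hat), 3))
    print("distance (cM, Haldane):", round(haldane_cm(theta_hat), 2))
```

With these hypothetical counts the maximum LOD score exceeds the genome-wide threshold of 3.3 quoted in the glossary. For small θ the Haldane distance is close to 100 × θ, in line with the approximation 1 cM = 1% recombination fraction.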
Cite this article
Ott, J., Wang, J. & Leal, S. Genetic linkage analysis in the age of whole-genome sequencing. Nat Rev Genet 16, 275–284 (2015). https://doi.org/10.1038/nrg3908