Design Considerations for Genetic Linkage and Association Studies

Nsengimana, Jérémie; Bishop, D. Timothy

doi:10.1007/978-1-61779-555-8_13

Part of the book series: Methods in Molecular Biology (MIMB, volume 850)

Abstract

This chapter describes the main issues that genetic epidemiologists usually consider in the design of linkage and association studies. For linkage, we briefly consider the situation of rare, highly penetrant alleles showing a disease pattern consistent with Mendelian inheritance, investigated through parametric methods in large pedigrees or with autozygosity mapping in inbred families. We then turn our focus to the most common design, affected sibling pairs, which is of more relevance for common, complex diseases. Theoretical and more practical power and sample size calculations are provided as a function of the strength of the genetic effect being investigated. We also discuss the impact of other determinants of statistical power, such as disease heterogeneity and pedigree and genotyping errors, as well as the effect of the type and density of genetic markers.

Linkage studies should be as large as possible to have sufficient power in relation to the expected genetic effect size. Segregation analysis, a formal statistical technique to describe the underlying genetic susceptibility, may, for instance, assist in estimating the relevant parameters to apply. However, segregation analyses estimate the total genetic component rather than a single-locus effect. Locus heterogeneity should be considered when power is estimated and at the analysis stage, i.e. by assuming a smaller locus effect than the total genetic component from segregation studies. Disease heterogeneity should be minimised by considering subtypes if they are well defined, or otherwise by collecting known sources of heterogeneity and adjusting for them as covariates; the power will depend upon the relationship between the disease subtype and the underlying genotypes. Ultimately, identifying susceptibility alleles of modest effect (e.g. RR ≤ 1.5) requires a number of families that seems unfeasible in a single study. Meta-analysis and data pooling between different research groups can provide a sizeable study, but both approaches require an even higher level of vigilance about locus and disease heterogeneity when data come from different populations. All necessary steps should be taken to minimise pedigree and genotyping errors at the study design stage as they are, for the most part, due to human factors. A two-stage design is more cost-effective than a one-stage design when using short tandem repeats (STRs). However, dense single-nucleotide polymorphism (SNP) arrays offer a more robust alternative, and due to their lower cost per unit, the total cost of studies using SNPs may in the future become comparable to that of studies using STRs in one or two stages.

For association studies, we consider the popular case–control design for dichotomous phenotypes, and we provide power and sample size calculations for one-stage and multistage designs. For candidate genes, guidelines are given on the prioritisation of genetic variants, and for genome-wide association studies (GWAS), the issue of choosing an appropriate SNP array is discussed. A warning is issued regarding the danger of designing an underpowered replication study following an initial GWAS. The risk of finding spurious association due to population stratification, cryptic relatedness, and differential bias is underlined.

GWAS have high power to detect common variants of high or moderate effect. For weaker effects (e.g. relative risk < 1.2), the power is greatly reduced, particularly for recessive loci. While sample sizes of 10,000 or 20,000 cases are not beyond reach for most common diseases, only meta-analyses and data pooling can allow attaining a study size of this magnitude for many other diseases. It is acknowledged that detecting the effects of rare alleles (i.e. frequency < 5%) is not feasible in GWAS, and it is expected that novel methods and technology, such as next-generation resequencing, will fill this gap. At the current stage, the choice of which GWAS SNP array to use does not influence the power in populations of European ancestry. A multistage design reduces the study cost but has less power than the standard one-stage design. If one opts for a multistage design, the power can be improved by jointly analysing the data from different stages for the SNPs they share. The estimates of locus contribution to disease risk from genome-wide scans are often biased, and relying on them might result in an underpowered replication study. Population structure has so far caused fewer spurious associations than initially feared, thanks to systematic ethnicity matching and application of standard quality control measures. Differential bias could be a more serious threat and must be minimised by strictly controlling all aspects of DNA acquisition, storage, and processing.
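
The dependence of case–control power on effect size and sample size noted above can be illustrated with a rough calculation. The sketch below is not the chapter's own method: it assumes a multiplicative (per-allele) relative risk, a rare disease so that control allele frequencies approximate the population frequency, a two-sided two-proportion z-test on allele counts, and a genome-wide significance threshold of 5e-8. The function names `case_allele_freq` and `allelic_power` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_freq(p, rr):
    """Risk-allele frequency in cases under a multiplicative model.

    p  : population risk-allele frequency (assumption: also the control frequency)
    rr : per-allele relative risk
    """
    return p * rr / (p * rr + (1.0 - p))


def allelic_power(n_cases, n_controls, p, rr, alpha=5e-8):
    """Approximate power of a two-sided allelic z-test (normal approximation).

    Only the upper rejection tail is counted, which is negligible
    bias whenever rr differs from 1.
    """
    nd = NormalDist()
    p1 = case_allele_freq(p, rr)   # allele frequency in cases
    p0 = p                         # allele frequency in controls (rare-disease assumption)
    a = 2 * n_cases                # number of case chromosomes
    b = 2 * n_controls             # number of control chromosomes

    # Standard error under the null (pooled frequency) and under the alternative.
    p_bar = (a * p1 + b * p0) / (a + b)
    se_null = sqrt(p_bar * (1.0 - p_bar) * (1.0 / a + 1.0 / b))
    se_alt = sqrt(p1 * (1.0 - p1) / a + p0 * (1.0 - p0) / b)

    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    return nd.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    # Illustrative inputs only: allele frequency 0.3, equal numbers of cases and controls.
    for n in (2000, 10000):
        for rr in (1.2, 1.5):
            print(f"n={n:>6} per group, RR={rr}: power={allelic_power(n, n, 0.3, rr):.3f}")
```

Under these assumptions, the calculation shows the qualitative pattern described in the abstract: a per-allele relative risk of 1.5 is well powered with a few thousand cases, whereas a relative risk of 1.2 needs a study on the order of 10,000 cases. Recessive or dominant models, imperfect tagging of the causal variant, and multistage designs would all change the numbers.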

Author information

Authors and Affiliations

Section of Epidemiology and Biostatistics, Leeds Institute of Molecular Medicine, University of Leeds, Cancer Genetics Building, Leeds, UK
Jérémie Nsengimana & D. Timothy Bishop

Corresponding author

Correspondence to Jérémie Nsengimana.

Editor information

Editors and Affiliations

School of Medicine, Dept. Epidemiology & Biostatistics, Case Western Reserve University, Cornell Road 2103, Cleveland, 44106, Ohio, USA
Robert C. Elston
Dept. Epidemiology & Biostatistics, Memorial Sloan-Kettering Cancer Center, East 63rd Street 307, New York, 10021, New York, USA
Jaya M. Satagopan
School of Medicine, Dept. Epidemiology & Biostatistics, Case Western Reserve University, Cornell Road 2103, Cleveland, 44106, Ohio, USA
Shuying Sun

Cite this protocol

Nsengimana, J., Bishop, D.T. (2012). Design Considerations for Genetic Linkage and Association Studies. In: Elston, R., Satagopan, J., Sun, S. (eds) Statistical Human Genetics. Methods in Molecular Biology, vol 850. Humana Press. https://doi.org/10.1007/978-1-61779-555-8_13

DOI: https://doi.org/10.1007/978-1-61779-555-8_13
Published: 20 December 2011
Publisher Name: Humana Press
Print ISBN: 978-1-61779-554-1
Online ISBN: 978-1-61779-555-8
eBook Packages: Springer Protocols
