Design Considerations for Genetic Linkage and Association Studies

Part of the Methods in Molecular Biology book series (MIMB, volume 1666)


This chapter describes the main issues that genetic epidemiologists usually consider when designing linkage and association studies. For linkage, we briefly consider rare, highly penetrant alleles that produce a disease pattern consistent with Mendelian inheritance, investigated through parametric methods in large pedigrees or through autozygosity mapping in inbred families. We then turn to the most common design, the affected sibling pair design, which is more relevant to common, complex diseases. Power and sample size calculations are provided as a function of the strength of the genetic effect being investigated. We also discuss other determinants of statistical power, such as disease heterogeneity, pedigree and genotyping errors, and the type and density of genetic markers. For association studies, we consider the popular case–control design for dichotomous phenotypes and provide power and sample size calculations for one-stage and multistage designs. For candidate genes, guidelines are given on the prioritization of genetic variants, and for genome-wide association studies (GWAS) the choice of an appropriate SNP array is discussed. We warn against designing an underpowered replication study following an initial GWAS, and we underline the risk of spurious association due to population stratification, cryptic relatedness, and differential bias.
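As a rough illustration of the kind of one-stage case–control power calculation mentioned above, the following Python sketch approximates the power of a two-sided allelic test. It is not the chapter's own method. The function names, the multiplicative (per-allele) odds ratio model, the use of the control allele frequency as the baseline, the normal approximation with an unpooled standard error, and the default genome-wide threshold of 5e-8 are all assumptions made for this example.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_freq(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by a per-allele odds ratio.

    Assumption: the odds ratio acts on allelic odds relative to controls.
    """
    odds = odds_ratio * p_control / (1.0 - p_control)
    return odds / (1.0 + odds)


def allelic_test_power(n_cases: int, n_controls: int, p_control: float,
                       odds_ratio: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided two-proportion z-test on allele counts.

    Assumptions (illustrative only): alleles are independent draws
    (Hardy-Weinberg equilibrium), and the standard error is the unpooled
    one evaluated under the alternative.
    """
    p_case = case_allele_freq(p_control, odds_ratio)
    # Each diploid individual contributes two alleles.
    alleles_cases = 2 * n_cases
    alleles_controls = 2 * n_controls
    se = sqrt(p_case * (1.0 - p_case) / alleles_cases
              + p_control * (1.0 - p_control) / alleles_controls)
    normal = NormalDist()
    z_crit = normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(p_case - p_control) / se
    # Probability that the test statistic falls in either rejection tail.
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical design: 2,000 cases, 2,000 controls, control allele
    # frequency 0.3, per-allele odds ratio 1.2.
    for n in (1000, 2000, 5000):
        print(n, round(allelic_test_power(n, n, 0.3, 1.2), 3))
```

Running the loop for several sample sizes shows how quickly power changes with the number of cases and controls at a fixed effect size, which is the trade-off that a replication study must also respect.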

Key words

Linkage; Sib pairs; Heterogeneity; Marker density; Association; Power; False positives; Stratification; Cryptic relatedness; Differential bias



Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. Section of Epidemiology and Biostatistics, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
